Abstract

We present a procedure for efficiently compressing astronomical radio data for high performance applications.
Integrated, post-correlation data are first passed through a nearly lossless rounding step which compares the precision of the
data to a generalized and calibration-independent form of the radiometer equation. This allows the precision of the data
to be reduced in a way that has an insignificant impact on the data. The newly developed Bitshuffle lossless
compression algorithm is subsequently applied. When the algorithm is used in conjunction with the HDF5 library and data
format, data produced by the CHIME Pathfinder telescope are compressed to 28% of their original size and decompression
throughputs in excess of 1 GB/s are obtained on a single core.
1. Introduction

The simultaneous drives to wider fields and higher sensitivity
have led radio astronomy to the cusp of a big-data
revolution. There is a multitude of instruments, including
21 cm cosmology experiments, Square Kilometer Array
precursors, and ultimately the Square Kilometer Array,
whose rate of data production will be orders
of magnitude higher than that of any existing radio telescope. An
early example is the CHIME Pathfinder, which will
soon be producing data at a steady rate of over 4 TB per
day. The cost associated with storing and handling these
data can be considerable and therefore it is desirable to
reduce the size of the data as much as possible using
compression. At the same time, these data volumes produce a
significant data processing challenge. Any data
compression/decompression scheme must be fast enough not to
hinder data processing, and would ideally lead to a net
increase in performance due to the reduced time required
to read the data from disk.

Here, after discussing some general considerations for
designing data storage formats in Section 2, we present
a scheme for compressing astronomical radio data. Our
procedure has two steps: a controlled (relative to thermal
noise) reduction of the precision of the data, which reduces
its information entropy (Section 3), and a lossless
compression algorithm, Bitshuffle¹, which exploits this
reduction in entropy to achieve a very high compression
ratio (Section 4). These two steps are independent in that,
while they work very well together, either of them can be
used without the other. When we evaluate our method in
Section 5 we show that the precision reduction improves
compression ratios for most lossless compressors. Likewise,
Bitshuffle outperforms most other lossless compressors
even in the absence of precision reduction.

2. Considerations for designing data storage formats

2.1. Characteristics of radio-astronomy data and usage patterns

Integrated, post-correlation radio-astronomy data are
typically at least three dimensional, containing axes
representing spectral frequency, correlation product, and time².
The correlation product refers to the correlation of all
antenna input pairs, including auto-correlations and
cross-correlations between different polarisations from the same
antenna. In a single dish these form the polarization
channels for each beam and in an interferometer these are the
visibilities. This also applies to beam forming
interferometers, where linear combinations of antenna inputs are
formed (either in analog or digitally) before correlation.


1 https://github.com/kiyo-masui/bitshuffle
2 A fourth axis is often introduced when data are ‘folded’ or
‘gated’, i.e., if data from the on- and off-periods of a switched
calibration noise source are accumulated separately, or pulsar data are
folded on the pulsar’s period, which is divided into many gates.

The CHIME collaboration determined that its data are
most commonly accessed along the time axis. That is, it
is generally most efficient for the axis representing time to
be the fastest varying once loaded into memory. This is
the case for noise characterisation, radio-frequency
interference (RFI) flagging and system-health monitoring, to
name a few. Most importantly, the map-making pipeline
typically produces maps on a per-frequency basis and is
most efficient at processing time-contiguous data. Though
it is sometimes necessary to work with spectra (slices along
the frequency axis) or ‘correlation triangles’ (slices along
the correlation product axis), we find that these use cases
normally only involve a few slices and large I/O operations
in these spaces are rare.

Of course, the CHIME collaboration’s preference for
the time axis to be the fastest varying will not apply to
all consumers of radio data. One expects that access
patterns might vary considerably for the diverse applications
of radio data, including spectroscopy, synthesis imaging,
and pulsar timing. But as discussed below, arranging data
with time as the fastest varying index is beneficial for data
compression.

2.2. Compression: benefits and requirements 

Compression can greatly ease the burden of storing and
handling large data sets, but there are also performance
benefits. Compression algorithms exist whose decompression
cost is negligible compared to the cost of reading from
disk. As we will show, data may be compressed by up to
a factor of four in some cases. When disk reads dominate,
the time required to load a dataset from disk into memory
may therefore be reduced by up to the same factor.

We previously stated that ordering data with the axis
representing time as the fastest varying is most efficient
for the majority of I/O operations. This ordering is also
beneficial for compression, since adjacent data points are
likely to be highly correlated, presuming that the cadence
is such that the spatially-smoothly varying sky is Nyquist
sampled. On the other hand, it is most natural to record
data with time as the slowest varying index since that is
the order in which they are generated by the instrument.
To have time as the fastest varying index, the data must
either be buffered in memory (which is impractical), written
with strided writes (which is inefficient) or reordered after
acquisition. Since the data are acquired and written only
once but read many times, it is logical to prioritise read
performance over write performance. Thus, the CHIME
collaboration deemed a post-acquisition reordering step to
be worthwhile.

The same argument can be used to prioritise data
decompression speed over compression speed. Compression
is sufficiently cheap computationally that even a modest
number of processors should be able to keep up with the
acquisition rate of CHIME Pathfinder data (which will be
~50 MiB/s depending on runtime parameters) for almost
any compression algorithm. Even if this were not the case,
data could be compressed in parallel post acquisition. On
the other hand, one might wish to load several days of
acquired data at once for analysis, and ideally, this would be
bound only by disk read times, not decompression speed.

The decompression cost may not be negligible
compared to read times for files that are cached in memory
or stored on high-performance parallel file systems. This
makes it desirable to have as fast a decompression scheme
as possible, as the benefits of speed are not always limited
by hard drive access. A multi-threaded implementation of
the decompression algorithm can therefore result in a
significant speed up on multi-core systems.

To summarise all of the foregoing, the following
requirements for a compression scheme emerge:


Unbiased Any lossy compression must not bias the data 
in any way. 

Nearly lossless Any lossy compression employed must 
be controlled in a manner that is guaranteed not to 
significantly decrease the sensitivity of the data. 

Time minor In the multi-dimensional dataset, the axis 
representing time should be the fastest varying. This 
allows for the efficient reading of small subsets of 
spectral frequencies and correlation products but for 
large periods of time. 

Fast decompression To realize the performance gains
associated with compressing the data, we require the
time to decompress the data to be small compared
to the time required to read the data from disk. At
the time of writing, a single hard drive can typically
be read at a rate of ~100 MiB/s. As such, a
compression algorithm with a decompression throughput
of ~1 GiB/s on a single processor is desirable.

Threaded When using a parallel file system, or when the
file is cached in system memory, reading throughputs
can be much higher compared to when using a single
hard disk. For decompression to not degrade
performance in these cases, the compression library should
be threaded.

Thread-safe While the HDF5 library (see Section 2.3
below) is not internally threaded, it may become so
in the future. In addition, programs may attempt
to hide the cost of I/O operations by putting them
in a separate thread. The compression library must
therefore be thread-safe.
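
As a rough illustration of the "Fast decompression" requirement, the sketch below estimates how quickly uncompressed data reach the application. The function name and the two simple cost models (serial, or reading overlapped with decompression) are assumptions made for this example and are not taken from the paper; the disk rate, decompression rate, and factor-of-four compression are the round numbers quoted in the text.

```python
def effective_rate(disk_rate, decompress_rate, ratio, overlapped=False):
    """Rate (MiB/s of uncompressed data) delivered to the application.

    disk_rate       -- raw read rate of the storage, in MiB/s
    decompress_rate -- decompressor output rate, in MiB/s of uncompressed data
    ratio           -- uncompressed size divided by compressed size
    overlapped      -- True if reading and decompression run concurrently
    """
    # Each MiB read from disk represents `ratio` MiB of uncompressed data.
    read_rate = disk_rate * ratio
    if overlapped:
        # The slower of the two stages limits the pipeline.
        return min(read_rate, decompress_rate)
    # Serial model: time per MiB is the sum of the read and decompress times.
    return 1.0 / (1.0 / read_rate + 1.0 / decompress_rate)


disk, ratio = 100.0, 4.0
fast = effective_rate(disk, 1024.0, ratio)  # decompressor at ~1 GiB/s
slow = effective_rate(disk, 100.0, ratio)   # decompressor no faster than the disk
print(f"fast decompressor: {fast:.0f} MiB/s, slow decompressor: {slow:.0f} MiB/s")
```

Under the serial model, a decompressor that is no faster than the disk delivers data more slowly than an uncompressed read would, which is why the requirement is stated relative to the disk read rate.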


2.3. HDF5 and chunked storage 

The Hierarchical Data Format 5 (HDF5) is a widely
used data format in astronomy, capable of storing and
organizing large amounts of numerical data. In the context
of this paper, it also has the benefit of allowing for ‘chunked
storage’. That is, an HDF5 ‘dataset’ (a multidimensional
array of homogeneous type) can be broken up into
subsets of fixed size, called chunks, which are stored in any
order on disk, with locations recorded in a look-up table.
This is in contrast to contiguous storage, in which
random access is trivial. The advantage of chunked
storage for our purposes is that though the number of
elements in a chunk is fixed, its size on disk is not, and as
such it may readily be compressed.

The primary drawback of chunked storage is that full
chunks must be read from disk at a time. As such, to read
a single array element, the full chunk containing thousands
of elements must be read. In practice, this is mitigated by
the fact that hard disk latencies are very long compared
to the time required to transfer a modest amount of data
once the read has begun. Typically a
chunk of several hundred KiB can be read from disk in
only twice the time required to read a single element of a
contiguous dataset, and thus the cost of chunked storage
for random access is at most a factor of two, as long as
chunk sizes are chosen appropriately. In addition,
requiring a large number of random accesses to a dataset is a
rare usage case for radio data, and therefore random
access performance should rarely be a driving factor when
designing a data format.
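
The factor-of-two estimate can be checked with a simple latency-plus-transfer model. The 10 ms access latency used here is a hypothetical round number chosen for illustration, not a figure from the paper; the 100 MiB/s transfer rate is the one quoted in Section 2.2.

```python
def read_time_ms(n_bytes, latency_ms=10.0, rate_mib_s=100.0):
    """Time to fetch n_bytes in one request: fixed latency plus transfer time."""
    transfer_ms = n_bytes / (rate_mib_s * 1024 * 1024) * 1000.0
    return latency_ms + transfer_ms


# One 8-byte value read from a contiguous dataset is dominated by latency.
single_element = read_time_ms(8)

for chunk_kib in (64, 256, 1024, 4096):
    chunk = read_time_ms(chunk_kib * 1024)
    print(f"{chunk_kib:5d} KiB chunk: {chunk:6.2f} ms, "
          f"{chunk / single_element:.2f}x a single element")
```

With these assumed numbers the penalty stays below a factor of two for chunks up to about 1 MiB and grows quickly beyond that, which is the sense in which chunk sizes must be "chosen appropriately".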

HDF5 implements compression of chunked datasets via
a ‘filter pipeline’. Any number of filters can be specified
when the dataset is created. When a data chunk is to
be stored, the buffer is sequentially passed through filter
functions before being written to disk. When the chunk
is to be read from disk, it is passed through the inverse
filter functions in reverse order before being presented to
the user. Many HDF5 filters are available whose functions
include lossless and lossy compression, preconditioning
filters aimed at improving compression ratios, and other
utilities such as data check-sums. The filter pipeline is also
extensible, and writing a new filter, such as the one
presented in this paper, is a relatively straightforward task.
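
The forward and inverse ordering of a filter pipeline can be illustrated with a toy model. This is not the HDF5 API: the `PIPELINE` list, `write_chunk`, `read_chunk`, and the CRC32 check-sum filter are hypothetical names and stand-ins invented for this sketch, built only on Python's standard `zlib` module.

```python
import zlib


def add_checksum(buf):
    """Append a 4-byte CRC32 of the buffer (a stand-in for a check-sum filter)."""
    return buf + zlib.crc32(buf).to_bytes(4, "little")


def strip_checksum(buf):
    """Verify and remove the trailing CRC32."""
    payload, stored = buf[:-4], int.from_bytes(buf[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError("chunk failed checksum")
    return payload


# Each filter is a (forward, inverse) pair, listed in the order applied on write.
PIPELINE = [
    (zlib.compress, zlib.decompress),
    (add_checksum, strip_checksum),
]


def write_chunk(buf):
    """Pass a chunk through every forward filter, first to last."""
    for forward, _ in PIPELINE:
        buf = forward(buf)
    return buf


def read_chunk(stored):
    """Pass a stored chunk through the inverse filters, last to first."""
    for _, inverse in reversed(PIPELINE):
        stored = inverse(stored)
    return stored


chunk = bytes(range(256)) * 4
stored = write_chunk(chunk)
assert read_chunk(stored) == chunk
print(len(chunk), "bytes stored as", len(stored), "bytes")
```

Because the check-sum is applied last on write, it is verified first on read, before any attempt is made to decompress a possibly corrupted buffer.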

3. Lossy entropy reduction: reduction of precision 

All experiments must perform some amount of lossy
compression simply by virtue of having to choose a finite
width data type which reduces precision by truncation.
Here, we focus on performing a reduction of precision in
a manner that is both controlled, in that it has a
well-understood effect on the data; and efficient, in that only
the required precision is kept, allowing for better
compression.

Reducing the precision of the data involves discarding
some number of the least significant bits of each data
element whose significance is small compared to the noise.
For integers, this is accomplished by rounding values to
a multiple of a power of two. For floating point numbers
the equivalent operation is performed on the significand
part of the value; however, in this work we focus on
integer data. In some cases this may allow for data to fit
into a smaller data type, for example single-precision
(32-bit) as opposed to double-precision (64-bit) floating point
numbers. However, even if this is not the case it is still
beneficial to identify bits that are well within the noise
margin and replace them with zeros. Bits that are
dominated by noise are essentially random and thus have very
high entropy. Any subsequent lossless compression has no
hope of compressing them despite their insignificance.

Of course, for this to be useful, it is necessary for the
lossless compression step to be able to exploit the reduced
entropy associated with zeroing the noisy bits. This will
be discussed in Section 4.

In this section, we begin with a discussion of noise for
a general radio dataset. We derive an expression for the
thermal noise, including possible correlated components
from the sky (so-called self-noise), that is independent of
calibration. We then use this estimate to derive an
acceptable level of rounding, specified by the rounding
granularity, such that the induced error and thus rounding noise
is negligible compared to the thermal noise. The final result is a
procedure for rounding the data elements that maximizes
the reduction of entropy of the data while constrained to a
fixed loss of sensitivity.

3.1. Noise and the radiometer equation 

Any discussion of reducing numerical precision must
necessarily include a discussion of noise, the intrinsic
scatter in the data independent of any discretization effects. If
the error induced by reducing the precision of the data is
small compared to the scatter from the noise, then the
reduction will have a negligible effect on the data, assuming
the error is unbiased. Here, we will focus on thermal noise,
which is present for all sources of incoherent radiation and
is thus a lower limit on the noise present in the data.
Thermal noise causes uncertainty in the measurement of
radiation power that is proportional to that power. It is due
to the stochastic nature of incoherent radiation.
Coherent sources of radiation can increase the measured power
without increasing scatter, but such sources are rare
in astronomy. Radio-frequency interference, however, may
be coherent.

Fortunately, the thermal noise can be estimated from
the data on a sample-by-sample basis, and independently
of any calibration factors, using the radiometer equation.
Usually it is assumed that the thermal noise is
uncorrelated between correlation products, but we will show that
the noise is in general correlated and will compute the
associated covariance matrix. We will argue that in some
observational regimes it may be necessary to take these
correlations into account when reducing the numerical
precision of the data relative to the noise.

We denote the correlation products in a single spectral
bin and in a single time integration as $V_{ij} = \langle a_i a_j^* \rangle$,
where $a_i$ is the digitized and Fourier-transformed signal
from antenna channel $i$, and the angular brackets are an
ensemble average which is approximated using a time
average within the integration time. The noise is
characterized by the covariance matrix of the full set of
correlation products, $C_{\alpha ij,\beta gh}$. Here, the indices $i$, $j$, $g$, and
$h$ run over the antenna number and $\alpha$ and $\beta$ run over
the real and imaginary parts. For example,
$C_{\mathrm{Re}\,ij,\mathrm{Im}\,gh} = \langle \mathrm{Re}V_{ij}\,\mathrm{Im}V_{gh} \rangle - \langle \mathrm{Re}V_{ij} \rangle \langle \mathrm{Im}V_{gh} \rangle$.

It is typically assumed that the cross-correlations
between channels are much smaller than the auto-correlations,
i.e. $V_{ij}V_{ij}^* \ll V_{ii}V_{jj}$ for $i \neq j$. This is because it is assumed
that the total measured power is dominated by noise from
the amplifiers in the signal chain prior to digitization. In
this limit, all correlation products are uncorrelated and
the well-known radiometer equations (chap. 6.2 of the cited
reference) are

$$\sigma^2_{\mathrm{Re}\,ii} = \frac{V_{ii}^2}{N}, \qquad (1)$$

$$\sigma^2_{\mathrm{Im}\,ii} = 0, \qquad (2)$$

$$\sigma^2_{\alpha ij} = \frac{V_{ii}V_{jj}}{2N} \quad (i \neq j). \qquad (3)$$

Here $N = \Delta\nu\,\Delta t$ is the number of samples entering the
integration, and one notes that the auto-correlations, $V_{ii}$,
are purely real. These can be aggregated into a diagonal
covariance matrix:

$$C_{\alpha ij,\beta gh} = \delta_{\alpha\beta}\,(\delta_{ig}\delta_{jh} + \delta_{ih}\delta_{jg})\,(1 - \delta_{ij}\delta_{\alpha\,\mathrm{Im}})\,\frac{V_{ii}V_{jj}}{2N}, \qquad (4)$$


where $\delta_{ij}$ is the Kronecker delta, which is unity for $i = j$
and zero otherwise. The first factor in parentheses just
cancels the factor of two for auto-correlations and accounts
for the fact that $V_{ij} = V_{ji}^*$. The second factor in
parentheses sets the whole expression to zero for the imaginary
part of the auto-correlations.

Equation 4 is appropriate in many observational regimes
and is the equation we use for CHIME data. However,
there are cases when the correlation products can be highly
correlated. With the recent increased emphasis on low
spectral frequencies (where sky brightness typically
dominates over amplifier noise) and on close packed arrays
(where amplifier noise may become coupled between
channels) the approximation that the cross-correlations are small
may not hold. A more general set of equations describing
the noise is:

$$C_{\mathrm{Re}\,ij,\mathrm{Re}\,gh} = \frac{1}{2N}\,\mathrm{Re}\left[V_{ig}V_{jh}^* + V_{ih}V_{jg}^*\right], \qquad (5)$$

$$C_{\mathrm{Im}\,ij,\mathrm{Im}\,gh} = \frac{1}{2N}\,\mathrm{Re}\left[V_{ig}V_{jh}^* - V_{ih}V_{jg}^*\right], \qquad (6)$$

$$C_{\mathrm{Re}\,ij,\mathrm{Im}\,gh} = \frac{1}{2N}\,\mathrm{Im}\left[-V_{ig}V_{jh}^* + V_{ih}V_{jg}^*\right]. \qquad (7)$$

These may be derived from first principles by Wick
expanding the four-point correlations of $a_i$ in terms of the
correlation products $V_{ij}$.

Below we show some special cases of the above
equations to illustrate how they differ from Equation 4.
One would expect that it is the diagonal of this
covariance matrix, $C_{\alpha ij,\alpha ij}$, to which the error associated with
precision reduction should be compared. These diagonal
elements give the variance of a data element marginalized
over all other elements. It is actually more appropriate to
compare the error to the unmarginalized variance, that is
$1/(C^{-1})_{\alpha ij,\alpha ij}$. If the truncation error is small compared
to the marginalized variance then the ability to measure
an individual visibility will be unaffected, while if the
error is small compared to the unmarginalized variance, then
the ability to measure any linear combination of
correlation products will be unaffected. The two expressions are
equivalent if using Equation 4 for the covariance but may
differ significantly when the data are dominated by the sky
instead of receiver noise.

As an illustrative example of how the noise can be
highly correlated between correlation products, consider
the case of an interferometer where the visibilities are
dominated by a single unresolved point source. For simplicity,
we assume the source is at zenith and that the gains and
phases of all inputs are calibrated, although our argument
does not depend on this. For these conditions, the real
parts of all visibilities are equal and proportional to the
source flux, while the imaginary parts are zero. As shown
in Kulkarni [18], the noise is dominated by the so-called
self-noise. All the elements of the (Re Re) block of the
covariance matrix are also equal, indicating that the
visibilities are perfectly correlated and that there is only one
effective measurement of the source. However, the
variance of the difference between any two visibilities is zero.
The difference of the two visibilities is essentially a new
correlation product whose effective beam has a null at the
zenith. So while the auto-correlations and thus the
variance of the visibilities are dominated by the source, any
linear combination of visibilities whose effective beam does
not include the source will have much lower noise.
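
The two regimes can be checked numerically with a small sketch of the general covariance in Equations 5 to 7. The `covariance` function and its argument layout are hypothetical, written for this illustration only; the $1/(2N)$ prefactor and the complex conjugates are those required for the general expressions to reduce to Equations 1 to 4 when the cross-correlations are small.

```python
def covariance(V, N, a, ij, b, gh):
    """Covariance between part `a` of V[i][j] and part `b` of V[g][h].

    V    -- square Hermitian matrix of correlation products (nested lists)
    N    -- number of samples entering the integration
    a, b -- "Re" or "Im"
    """
    (i, j), (g, h) = ij, gh
    t1 = V[i][g] * V[j][h].conjugate()
    t2 = V[i][h] * V[j][g].conjugate()
    if a == "Re" and b == "Re":
        return (t1 + t2).real / (2 * N)
    if a == "Im" and b == "Im":
        return (t1 - t2).real / (2 * N)
    if a == "Re" and b == "Im":
        return (-t1 + t2).imag / (2 * N)
    # The (Im, Re) block follows from the symmetry of the covariance matrix.
    return covariance(V, N, b, gh, a, ij)


# Receiver-noise dominated: small cross-correlation, close to V_ii V_jj / (2N).
V_rx = [[10 + 0j, 0.1 + 0.05j], [0.1 - 0.05j, 20 + 0j]]
print(covariance(V_rx, 1000, "Re", (0, 1), "Re", (0, 1)), 10 * 20 / (2 * 1000))

# Single point source at zenith: every visibility equals the source power S.
S, N = 5 + 0j, 100
V_src = [[S, S], [S, S]]
var_00 = covariance(V_src, N, "Re", (0, 0), "Re", (0, 0))
var_01 = covariance(V_src, N, "Re", (0, 1), "Re", (0, 1))
cov = covariance(V_src, N, "Re", (0, 0), "Re", (0, 1))
print("variance of the difference:", var_00 + var_01 - 2 * cov)
```

In the point-source case every element of the (Re Re) block equals $S^2/N$, so the variance of the difference between two visibilities vanishes, as stated in the text.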

When defining the matrix $C$, it is necessary to note
that there are several redundant combinations of indices
for the correlation products. The (Re $ij$) correlation product is
equivalent to the (Re $ji$) correlation product, and the (Im $ij$) and (Im $ji$)
correlation products are related by a negative sign. In
addition the (Im $ii$) correlation products are identically zero,
carry no information, and have no noise. The rows and
columns of $C$ associated with these correlation products
should be discarded. After removing this redundancy,
$C$ is guaranteed to be at least positive semi-definite
and, as long as the system temperature is finite, positive
definite and thus invertible. Only non-linearity in the
correlator (not the analog systems or ADCs) can render it
non-invertible.

Unfortunately, for interferometers with more than a
few dozen elements, it is not feasible to invert the
covariance matrix for every spectral bin and every temporal
integration in real time. As such, for large interferometers,
we must fall back to using Equation 4 instead of Equations 5 to 7.
While not strictly accurate, Equation 4 remains an
excellent approximation for most observational regimes. For
CHIME, no source on the sky other than the sun increases
the total power by more than roughly 40% at any spectral
frequency. Since the errors from precision reduction will
be sub-dominant to the noise by several orders of
magnitude, even order unity mis-estimations of the noise should
have negligible impact on the final data. Nonetheless, one
may want to compensate for this error by being extra
conservative when specifying the degree of precision reduction,
as will be discussed in the next section.

For the remainder of this paper we will define the RMS
noise used to calculate truncation precision as

$$s_{\alpha ij} = 1/\sqrt{(C^{-1})_{\alpha ij,\alpha ij}}, \qquad (14)$$

with $C$ defined in either Equation 4 or Equations 5 to 7,
depending on interferometer size.
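
To make the difference between the marginalized and unmarginalized RMS concrete, the following sketch evaluates both for a hypothetical two-element covariance matrix. The helper names and the example matrices are assumptions for illustration; a real interferometer would have a much larger matrix.

```python
import math


def invert_2x2(C):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = C
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def rms_marginalized(C):
    """Square root of the diagonal of C."""
    return [math.sqrt(C[k][k]) for k in range(2)]


def rms_unmarginalized(C):
    """One over the square root of the diagonal of the inverse of C."""
    Cinv = invert_2x2(C)
    return [1.0 / math.sqrt(Cinv[k][k]) for k in range(2)]


uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.9], [0.9, 1.0]]

for name, C in (("uncorrelated", uncorrelated), ("correlated", correlated)):
    print(name, rms_marginalized(C), rms_unmarginalized(C))
```

For the diagonal matrix the two definitions agree, while for the strongly correlated one the unmarginalized RMS is less than half the marginalized value, so rounding set by the marginalized variance would be too coarse for some linear combinations of the data.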

3.2. Rounding the data relative to the noise 

With an expression in hand for $s_{\alpha ij}$, which is our basis
for comparison when adding numerical noise, we can
proceed to derive a procedure for rounding the data values.
In this section we will drop the indices on $s$, with the
procedure being understood to apply on a correlation product
by correlation product basis.

We will be relating $s$ to the rounding granularity $g$,
the power of two to whose multiples each data element
will be rounded. Note that by rounding
instead of simply truncating, one extra bit can be discarded
for essentially no cost since the induced error associated
with zeroing a given number of bits is half as large.
Rounding also has the very desirable property of being unbiased³
provided the data values are randomly distributed within
the granularity (which is an excellent approximation since
the granularity will be much smaller than the Gaussian
thermal noise). This is not true of truncation, where the
bias is half the granularity.

We define $\sigma_r^2$ as the added noise variance associated
with reducing the precision of a data element. We will
require that $\sigma_r^2 < f s^2$, where $f$ is the maximum
fractional increase in noise from precision reduction. It can
be thought of as the effective fractional loss of integration
time caused by reducing the precision.


3 We choose the ‘round ties to even’ tie breaking scheme [15], which
is unbiased.

When using the approximate Equation 4 in the
definition of $s$, the approximation can be compensated for by
reducing $f$ by a factor of the minimum portion of total power
originating from receiver noise, squared. To be more
precise, multiplying $f$ by the minimum of $1 - V_{ij}V_{ij}^*/(V_{ii}V_{jj})$
(for $i \neq j$) roughly compensates for the approximation.
This will be especially relevant to low-frequency compact
arrays with many elements, for reasons mentioned above.
This could in principle be performed dynamically as a
function of time or spectral frequency, although no
attempt is made to implement this.

For randomly distributed rounding errors (which is an
excellent approximation since the rounding will occur in
noise-dominated bits), the rounding noise is related to the
granularity by:

$$\sigma_r^2 = g^2/12. \qquad (15)$$

Thus the maximum rounding granularity is:

$$g < \sqrt{12 f s^2}. \qquad (16)$$

Our precision-reduction scheme is to round each data
element to a multiple of the largest possible power of two, $g$,
subject to the constraint given in Equation 16. We note
that this equation gives an upper limit, and that on
average the granularity and added noise will be below this
limit.
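
The rounding step can be sketched in a few lines of integer arithmetic. This is an illustrative pure-Python version written for this text, not the authors' Cython implementation; the function names are hypothetical, and the sketch accepts equality in Equation 16 and never uses a granularity below 1.

```python
import math


def granularity(s, f):
    """Largest power of two g (at least 1) with g <= sqrt(12 * f * s**2)."""
    g_max = math.sqrt(12.0 * f) * s
    if g_max < 1.0:
        return 1  # integer data cannot be rounded more finely than 1
    return 2 ** int(math.floor(math.log2(g_max)))


def round_to_granularity(x, g):
    """Round integer x to the nearest multiple of g, ties to even multiples."""
    q, r = divmod(x, g)
    if 2 * r > g or (2 * r == g and q % 2 == 1):
        q += 1
    return q * g


s, f = 100.0, 0.01
g = granularity(s, f)
errors = [round_to_granularity(x, g) - x for x in range(-4096, 4096)]
mean_error = sum(errors) / len(errors)
added_variance = sum(e * e for e in errors) / len(errors)
print("g =", g)
print("mean error =", mean_error)
print("added variance =", added_variance, "limit f*s^2 =", f * s * s)
```

Rounding ties to even multiples is what keeps the mean error at zero over a uniform spread of input values; always rounding ties upward would leave a small positive bias.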

For maximum generality, the set of $s_{\alpha ij}$ should be
recalculated for each spectral frequency and each temporal
integration, allowing the precision reduction to adapt to
sky and bandpass spectral structure, temporal changes in
the sky, and time and frequency dependent RFI.

When using Equation 4 for the noise, the calculation of
$s_{\alpha ij}$ requires only a handful of floating point operations per
data element and thus has negligible cost compared to the
initial correlation. Finding the largest integer power-of-two
granularity that satisfies Equation 16 and then
rounding to that granularity can be performed in tens of
instructions with no branching. An example implementation in
the Cython programming language is available online⁴.

As discussed above, performance is a greater concern
for decompression than compression. The precision
reduction requires no decoding, and as such its throughput is of
secondary concern. The example implementation is only
lightly optimized and achieves ~300 MiB/s throughput
on a single core of a modern processor. This was deemed
sufficient for CHIME’s data acquisition, although we
hypothesize that a factor of four speed-up may be possible
by employing the vectorized SSE instruction sets.

The precision reduction applied to the data shown in
Figure 1 is illustrated in the first two panels of Figure 2.


4 https://gist.github.com/kiyo-masui/b61c7fa4fIlfca453bdd 
Figure 1: Example visibility data for an inter-cylinder
baseline of the CHIME Pathfinder: 96 10 s-integrations of
a 0.39 MHz wide spectral bin at 644 MHz.


4. Lossless compression: Bitshuffle 

Here we discuss lossless data compressors in the
context of radio astronomical data. We seek a compressor
that is fast enough for high performance applications but
also obtains high compression ratios, especially in the
context of the precision reduction discussed in the previous
section. Satisfying both criteria is difficult and existing
compressors are found to be inadequate. Therefore, a
custom compression algorithm, Bitshuffle, was developed;
it is both fast and obtains high compression ratios, at the
expense of being slightly less general.

In this section we begin by reviewing popular
algorithms, some understanding of which is necessary to
motivate the design of Bitshuffle. We then describe the
Bitshuffle algorithm, its interaction with the precision
reduction step, and its implementation.

4.1. Brief description of popular lossless compression algorithms

By far, the most common class of compression
algorithms is the LZ77 class of encoders. These include
LZ77, LZF⁵, LZO⁶, Google’s Snappy⁷, LZ4⁸ and others. The
LZ77 encoders compress data by searching for repeated
sequences of bytes in the uncompressed data stream. When
a sequence that occurred earlier in the stream is found it
is replaced by a reference to the earlier occurrence. This is
represented by a pair of tokens representing the length of
the sequence and the location of the previous occurrence
as an offset from the present location. It is worth noting
that run-length encoding, where consecutive repetitions of
identical sequences of bytes are eliminated, is a special case
and can be represented by setting the length to be greater
than the offset. This class of encoders has the advantage
that it can be made very fast, with some implementations
(e.g. LZ4) achieving greater than 2 GB/s decompression
speed on a single core of a modern processor.


5 http://oldhome.schmorp.de/marc/liblzf.html
6 http://www.oberhumer.com/opensource/lzo/
7 https://code.google.com/p/snappy/
8 https://code.google.com/p/lz4/

Figure 2: Bit representation of the data in Figure 1 at
various stages of compression. The data are stored as an
array of two element structs, where the real and imaginary
parts are each represented by little-endian signed integers.
The data are natively represented by 32-bit integers, but
for the compactness of this figure, we divide the data by
$2^8$ and use 16-bit integers. In all panels, the memory is
laid out from left to right (in rows of 96 bits) then top to
bottom, with black representing a set bit and white
representing an unset bit. Within an 8-bit byte the bits are
unpacked from the least significant bit to the most
significant bit, which is convenient for visualizing little-endian
data types. The panels represent, from top to bottom:

1. The original data with each row containing 3 integrations.

2. Data after reducing precision with $f = 0.01$.

3. Data after the bit-transpose step of Bitshuffle.
Each column contains a single integration.

4. Data after the LZ4 compression step of Bitshuffle.
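
The remark that run-length encoding corresponds to a length greater than the offset is easiest to see in a decoder. The sketch below is a toy decoder with a hypothetical token format (literal `bytes` objects and `(length, offset)` tuples) chosen for readability; it does not reproduce the on-disk format of LZ4 or any other real encoder.

```python
def lz77_decode(tokens):
    """Decode a list of tokens into bytes.

    A token is either a bytes literal, copied verbatim, or a (length, offset)
    pair meaning "copy `length` bytes starting `offset` bytes back".
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, bytes):
            out += token
            continue
        length, offset = token
        start = len(out) - offset
        # Copy one byte at a time so that the source may overlap the bytes
        # being written; this is what makes length > offset meaningful.
        for k in range(length):
            out.append(out[start + k])
    return bytes(out)


print(lz77_decode([b"abcd", (4, 4)]))  # plain back-reference to "abcd"
print(lz77_decode([b"ab", (6, 2)]))    # run-length case: length > offset
print(lz77_decode([b"\x00", (7, 1)]))  # a run of eight zero bytes
```

In the second and third calls the copy reads bytes that were written earlier in the same copy, so a single token expands a short pattern into an arbitrarily long run.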

The DEFLATE algorithm, best known for its use
in both the gzip and zip programs and file formats,
includes LZ77 encoding as a first step, followed by
Huffman coding [23]. Huffman coding entails replacing the
most commonly occurring byte values in the uncompressed
stream with shorter representations (less than one byte).
Less commonly occurring values must be represented using
a symbol that is larger than a byte. The LZ77 encoding
and Huffman coding are synergetic as they exploit different
types of redundancy. LZ77 exploits spatial redundancy:
a sequence must exactly match a previous occurrence,
but it can be compressed even if the byte values
within the sequence are rare. The Huffman
coding step exploits only the fact that some byte values may
be more common than others and compresses data even
if these bytes are randomly ordered. Due to the
additional Huffman coding step, DEFLATE generally achieves
higher compression ratios than the pure LZ77 class
encoders. However, the computational cost is high and, as we
will show, DEFLATE implementations are generally roughly
a factor of ten slower than the fastest LZ77 encoders for
both compression and decompression.

The above algorithms are representative of those most
commonly used for scientific data. Notable omissions are
bzip2⁹ and LZMA¹⁰, both of which generally achieve higher
compression ratios than DEFLATE but were deemed too
computationally expensive for high-performance
applications.

For typed binary data, where the data consist of arrays
of elements of a fixed number of bytes, it has been
recognized that compression is generally improved by applying
the byte-reordering shuffle pre-filter [25]. shuffle
breaks apart the bytes of each data element, grouping together
all the first (second, etc.) bytes. To put this in
other terms, if you arrange all the bytes in the array into
a matrix with dimensions of the number of elements by
the size of each element, shuffle performs a transpose on
this matrix. This improves compression ratios, primarily
in the LZ77 step, by creating long runs of highly correlated
bytes. This relies on consecutive values of the data
themselves being highly correlated, but this is broadly the
case in scientific data. An illustrative special case is
unsigned integer data that span only a subset of the
representable values. Any unexercised most-significant bytes
are grouped together into a long run of zeros and are
trivially compressed.
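
As a minimal sketch of the byte transpose described above (pure Python, with function names chosen here for illustration rather than taken from any library), consider eight little-endian 32-bit unsigned integers whose upper two bytes are never exercised:

```python
import struct

def shuffle(buf: bytes, elem_size: int) -> bytes:
    """Byte transpose: all first bytes, then all second bytes, and so on."""
    assert len(buf) % elem_size == 0
    return b"".join(buf[k::elem_size] for k in range(elem_size))

def unshuffle(buf: bytes, elem_size: int) -> bytes:
    """Inverse of shuffle."""
    n_elem = len(buf) // elem_size
    return b"".join(buf[i::n_elem] for i in range(n_elem))

# Eight consecutive values stored as little-endian 32-bit unsigned integers.
raw = struct.pack("<8I", *range(1000, 1008))
shuffled = shuffle(raw, 4)

print(raw.hex(" ", 4))        # one element per group
print(shuffled.hex(" ", 8))   # one byte position per group
```

After the transpose, the two most significant byte positions form a single run of sixteen zero bytes, and the second byte position is a run of eight identical bytes.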

9 http://www.bzip.org/

10 http://www.7-zip.org/sdk.html

When paired with the precision reduction described in
Section 3, it is expected that it is the Huffman coding
that will best exploit the associated reduction of entropy
to achieve a better compression ratio. Even when paired
with shuffle, the precision reduction does not generically
produce long runs of repeated bytes unless eight or more
bits are discarded. However, the frequency of certain byte
values (multiples of $2^{n_{\rm bt}}$, where $n_{\rm bt}$ is the number of bits
discarded) is greatly increased; this is a prime target for
Huffman coding.
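
The effect on byte-value frequencies can be illustrated with synthetic data. In this sketch the rounding helper, the choice of three discarded bits, and the uniformly random 16-bit test values are all assumptions made for the example; ties are broken toward an even multiple so that the rounding stays unbiased.

```python
from collections import Counter
import random

def round_to_multiple(value: int, n_bt: int) -> int:
    """Round to the nearest multiple of 2**n_bt (ties go to an even multiple)."""
    g = 1 << n_bt
    q, r = divmod(value, g)
    if 2 * r > g or (2 * r == g and q % 2 == 1):
        q += 1
    return q * g

n_bt = 3                                    # number of bits discarded
rng = random.Random(1)
values = [rng.randrange(1 << 16) for _ in range(10_000)]
rounded = [round_to_multiple(v, n_bt) for v in values]

# Histogram of the least significant byte before and after rounding.
before = Counter(v & 0xFF for v in values)
after = Counter(v & 0xFF for v in rounded)
print(len(before), "distinct low-byte values before rounding")
print(len(after), "distinct low-byte values after rounding")
```

Only multiples of $2^{n_{\rm bt}}$ survive in the least significant byte, so each surviving value becomes correspondingly more frequent, although the bytes still do not form long runs.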

4.2. Bitshuffle pre-filter and compressor

Bitshuffle extends the concept of shuffle to the bit
level: it arranges the bits of a typed data array into a
matrix with dimensions of the number of elements by the
size of each element (in bits), then performs a transpose.
This is illustrated in Panel 3 of Figure 2.

Bitshuffle is better able to convert spatial correlations
into run-lengths than shuffle because it is able to
treat correlations within a subset of the bits in a byte
instead of only those which apply to the whole byte. It thus
allows for the elimination of the computationally expensive
Huffman coding step of DEFLATE in favour of the fastest
LZ77-class compressor available (we use LZ4). The trade-off
is that, because each byte now contains bits from eight
neighbouring data elements, spatial correlations must be
eight times longer to produce run-lengths of useful length.
An illustration of bit-transposed data after compression is
shown in Panel 4 of Figure 2.
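
The bit-level transpose can be sketched in a few lines of pure Python. This is an unoptimized illustration, not the Bitshuffle implementation: the function names and the bit ordering (row b holds bit b of every element, packed eight elements per byte with the first element in the lowest-order bit) are conventions assumed for this example only.

```python
def bit_transpose(values, nbits):
    """Return nbits rows; row b packs bit b of every element, 8 per byte."""
    assert len(values) % 8 == 0
    rows = []
    for b in range(nbits):
        row = bytearray(len(values) // 8)
        for i, v in enumerate(values):
            row[i // 8] |= ((v >> b) & 1) << (i % 8)
        rows.append(bytes(row))
    return rows

def bit_untranspose(rows, count):
    """Inverse of bit_transpose for unsigned values."""
    return [sum(((row[i // 8] >> (i % 8)) & 1) << b for b, row in enumerate(rows))
            for i in range(count)]

# 64 unsigned 16-bit values in which only the two lowest bits vary.
values = [1000 + (i % 4) for i in range(64)]
rows = bit_transpose(values, 16)
for b, row in enumerate(rows):
    print(f"bit {b:2d}: {row.hex()}")
```

Every bit that is constant across the elements becomes a run of 0x00 or 0xff bytes, regardless of where that bit sits within a byte, which is what a fast LZ77-class compressor needs.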

While the practice is not widely used, we do not claim
to be the first to implement bit-transposition of data arrays
for the purposes of data compression. In addition to
several references to this idea scattered around the World
Wide Web, the MAFISC compressor [25] implements
bit-transposition as one of its pre-filters.

It is worthwhile to briefly discuss how two’s-complement
signed integers are compressed with Bitshuffle. In two’s
complement, zero is represented by having none of the bits
in the element set and -1 is represented by having all of
them set. As such, while the values of data having zero
crossings may be highly correlated, the bit representations
are not. Bit-transposing such datasets does not produce
long runs of zeros or ones in the most significant bits.
This can be clearly seen in Panel 3 of Figure 2. As such,
it might be expected that such data would not compress
well. However, while the sequence of bytes representing
the data’s most significant bit (bottom row of Panel 3 of
Figure 2) may be incompressible, it is identical to the
sequence of bytes representing the second most significant
bit (next to bottom row) and so on. As such, the block as
a whole turns out to be highly compressible.
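
This behaviour is easy to reproduce. The sketch below uses synthetic samples scattered around zero and zlib as a stand-in for an LZ77-class compressor (LZ4 is not in the Python standard library); the helper bit_plane and its bit ordering are assumptions of the example.

```python
import random
import zlib

def bit_plane(values, b):
    """Pack bit b of each two's-complement integer, eight elements per byte."""
    bits = [(v >> b) & 1 for v in values]     # Python's >> sign-extends negatives
    return bytes(sum(bit << k for k, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

rng = random.Random(3)
values = [rng.randrange(-4, 4) for _ in range(4096)]   # frequent zero crossings

planes = [bit_plane(values, b) for b in range(32)]     # 32-bit elements
sign_plane = planes[31]
block = b"".join(planes)

print("sign plane alone:", len(sign_plane), "->", len(zlib.compress(sign_plane, 9)))
print("whole block:     ", len(block), "->", len(zlib.compress(block, 9)))
```

The sign plane on its own is essentially random, but every plane above the magnitude of the data is a copy of it, so the compressor encodes it once and then refers back to it.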

4.2.1. Implementation

The bit-transpose operation presented here is computationally
more expensive than the byte-transpose in shuffle
by roughly a factor of four depending on implementation.
However, both costs are negligible relative to DEFLATE.
Bitshuffle implements the bit-transpose using the vectorized
SSE2 (present on x86 processors since 2001) and
AVX2 (present on x86 processors since 2013) instruction
sets when available. Using SSE2 instructions, the most
computationally expensive part of the bit-transpose can
process 16 bytes of data in 24 instructions. Using
AVX2 this improves to 32 bytes of data in 24 instructions.
In the absence of these instruction sets, the bit-transpose
is performed using an algorithm that processes 8 bytes in
18 instructions [25].

For performance reasons, it is beneficial to integrate the
lossless compressor, LZ4, directly into Bitshuffle rather
than applying it as a sequential HDF5 filter. The idea is
to bit-transpose a small block of data that fits comfortably
into the L1d memory cache and then apply the compressor
while it is still in cache. Since getting the memory
contents in and out of the cache can be the bottleneck,
especially when using multiple threads, this can greatly
improve performance. Usually, compressing data in small
blocks is detrimental to compression ratios, since the maximum
look-back distance for repeated sequences is limited.
However, because compression after the bit transpose is
trivial, this was found not to be the case for Bitshuffle.
The default block size in Bitshuffle is 4096 bytes.

Bitshuffle is both internally threaded using OpenMP 
and thread safe, making no use of global or static variables. 
Threading is implemented by distributing blocks among 
threads. 

Bitshuffle is written in the C programming language,
although it has bindings in Python and is distributed as
a Python package. In addition to routines for processing
raw buffers, it includes an HDF5 filter which is accessible
in Python, can be compiled into a C program,
or loaded dynamically using HDF5’s dynamically loaded
filters (available in HDF5 version 1.8.11 and later).

5. Evaluation of method 

In this section we apply the compression algorithm described
above to data from the CHIME Pathfinder to assess
the algorithm’s performance and to compare it with
other compression schemes. The Pathfinder comprises two
parabolic cylinders, each 20 m wide by 35 m long, with
their axes running in a north-south direction. 64 identical
dual-polarization feeds are located at 0.3 m intervals along
the central portion of each focal line.

The data used for the following comparisons were collected
on January 25, 2015, starting at roughly 2:10 AM
PDT. Analogue signals from the CHIME antenna are digitized
at 8 bits before being Fourier transformed into spectral
channels and correlated. We include data from 16
correlator inputs connected to four
dual-polarization antennas on each of the two cylinders.
The dataset includes 1024 time integrations of 21.45 s length,
136 correlation products, and a subsample of 64 of the 1024
spectral frequencies uniformly spanning the 400-800 MHz
band. The data are arranged such that time is the fastest
varying index and frequency the slowest and, as such, the
C shape of the array is (64, 136, 1024). Each element is
a struct of two 32-bit, signed, little-endian integers representing
the real and imaginary parts of the visibility. The
total size of the dataset is 64 × 136 × 1024 × 8 bytes =
68 MiB.

The data themselves have rich structure, including
spectral channels with either persistent or intermittent
RFI, a malfunctioning amplifier in one of the 16
signal chains (causing high power and noise in that channel),
and the transit of a bright source (the Crab Nebula)
as well as part of the Galactic plane. Figure 1 shows a
small subset of the dataset including the transit of the
Crab Nebula in an inter-cylinder baseline. These data
broadly represent the phenomena expected to occur in
CHIME data, but statistically will differ significantly from
the data produced by the full Pathfinder. With 256 correlator
inputs, the data produced by the full Pathfinder
will be much more heavily dominated by cross-correlations
of long baselines, which may have a significant impact on
compression.

For all the tests presented below, we use the HDF5 
data format and library to perform the compression and 
store the data. The chunk shape is chosen to be (8, 8, 
1024) which gives a total size of 512 KiB. 
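
The sizes quoted above follow directly from the array shapes. The short check below restates that arithmetic; the variable names are ours, and the axis labels in the comments follow the ordering described in the text.

```python
from math import prod

ITEM_BYTES = 8                       # two 32-bit integers: real and imaginary
dataset_shape = (64, 136, 1024)      # (frequency, correlation product, time)
chunk_shape = (8, 8, 1024)

n_products = 16 * 17 // 2            # pairs of 16 inputs, autocorrelations included
dataset_mib = prod(dataset_shape) * ITEM_BYTES / 2**20
chunk_kib = prod(chunk_shape) * ITEM_BYTES / 2**10
n_chunks = prod(d // c for d, c in zip(dataset_shape, chunk_shape))

print(n_products, "products;", dataset_mib, "MiB in", n_chunks, "chunks of",
      chunk_kib, "KiB")
```

Each chunk spans the full time axis, so the fastest varying index is never split across chunks.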

5.1. Distribution of rounding errors 

First we verify that our implementation of the precision
reduction behaves as expected when applied to real
data. We require that the rounding errors be unbiased,
and that the probability distribution of errors be more
concentrated than a top-hat function with width of the
maximum granularity, given in Equation 16. Rounding
errors are calculated by directly subtracting the original
dataset from the precision-reduced dataset then comparing
with the maximum granularity. The distribution of
these errors is shown in Figure 3 for various values of $f$.

The expected probability density is a superposition of
top-hat functions with widths depending on where the
maximum granularity falls relative to a power of two. The
function is flat between ±1/4 since the final power-of-two
granularity is always at least half the maximum granularity.
As expected, we see that no rounding error exceeds
half the maximum granularity in absolute value. For
$f = 10^{-5}$ there is an excess of errors at zero as well as a
noticeable jaggedness along the central plateau. This is because,
for a significant portion of the data, the granularity
is unity, implying no rounding for integers. One might
notice that the probability densities for $f = 10^{-5}$ and
$f = 10^{-2}$ are nearly identical. This is because these values
of $f$ differ by a factor very close to a power of four
($4^5$) and a scaling relation guarantees identical probability
distributions for this case.
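
The scaling relation can be checked numerically. This sketch assumes the maximum granularity takes the form $\sqrt{12 f s^2}$ (the scaling quoted in the caption of Figure 3), that the applied granularity is the largest power of two not exceeding it, and an arbitrary noise variance $s^2$; the function names are illustrative.

```python
import math

def max_granularity(f: float, s2: float) -> float:
    """Assumed form of the limit on rounding granularity: sqrt(12 f s^2)."""
    return math.sqrt(12.0 * f * s2)

def power_of_two_granularity(f: float, s2: float) -> int:
    """Largest power of two not exceeding the maximum granularity (at least 1)."""
    g = 1
    while 2 * g <= max_granularity(f, s2):
        g *= 2
    return g

def ratio(f: float, s2: float) -> float:
    """Applied granularity as a fraction of the maximum granularity."""
    return power_of_two_granularity(f, s2) / max_granularity(f, s2)

s2 = 1.0e12   # hypothetical noise variance, in squared integer units
for f in (1e-5, 4e-5, 1e-3, 1e-2):
    print(f"f = {f:g}: granularity / maximum = {ratio(f, s2):.4f}")
```

Multiplying $f$ by four doubles both granularities and leaves the ratio unchanged. A factor of 1000 is close to $4^5 = 1024$, so the ratios, and hence the scaled error distributions, nearly coincide.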

The requirement that the rounding be unbiased is satisfied
if the probability densities are symmetric about zero,
which we have verified is true to within statistical uncertainty
from the finite sample. We have also checked for
bias in the mean of the errors over time in each frequency
and correlation product, as well as searched for correlations
in the errors along the time axis, finding no evidence
for either.

Figure 3: Distribution of rounding errors for different
levels of precision reduction. Rounding errors are scaled
to the maximum granularity, $\sqrt{12 f s^2}$, and the histogram
is normalized to integrate to unity and thus approximates
the probability density function.

We conclude that there is no evidence that the precision
reduction is behaving other than expected. Our tests
are consistent with it increasing the noise by a fraction
of at most $f$, equivalent to a fractional loss of integration
time of $f$.

5.2. Effectively compressing precision-reduced data 

Here, we assess the effectiveness of the precision reduction
step and evaluate which subsequent lossless compression
algorithms are able to exploit the associated reduction
in entropy. In this section, we consider three classes
of lossless compression algorithm: the LZ77 class of encoders
represented by LZF (chosen due to the availability
of an HDF5 filter), the DEFLATE algorithm (implemented
in zlib¹¹ with compression level 4), and Bitshuffle. For
the LZF and DEFLATE cases, the shuffle pre-filter was applied
to the data, which was found to improve compression
ratios in all cases. We quote compression ratios as the ratio
of compressed data size to original data size, expressed
as a percentage. Results are shown in Table 1.

11 http://www.zlib.net/

Table 1: Compression ratios for various compression algorithms
as a function of degree of precision reduction (parameterized
by $f$, defined in Section 3).

$f$          LZF      DEFLATE   Bitshuffle
0            69.5%    61.2%     59.6%
$10^{-5}$    46.7%    38.5%     37.1%
$10^{-4}$    45.6%    34.1%     32.0%
$10^{-3}$    44.2%    30.6%     27.9%
$10^{-2}$    37.1%    25.9%     22.2%

It is seen that reducing the precision of the data does in
fact improve compression in all cases. Between $f = 10^{-5}$
and $f = 10^{-2}$, the precision of the data is decreased by a
factor of $\sqrt{1000}$, or 5.0 bits. Since the data are stored as
32-bit integers, one would optimistically hope for a 15.5%
improvement in compression ratio between these cases.
All three compressors do a reasonable job of exploiting
the reduced entropy, with LZF, DEFLATE, and Bitshuffle
achieving a 9.6%, 12.6%, and 14.9% increase in compression
respectively. LZF’s marginally poorer ability to exploit
the reduced entropy is in line with our expectations from
Section 4.1.
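
The arithmetic behind these figures can be reproduced directly from Table 1. The sketch assumes, as the text implies, that the rounding granularity scales as the square root of $f$, so that the number of discarded bits is half the base-two logarithm of the ratio of $f$ values; the variable names are ours.

```python
import math

f_low, f_high = 1e-5, 1e-2
bits_discarded = 0.5 * math.log2(f_high / f_low)   # granularity scales as sqrt(f)
ideal_gain = 100.0 * bits_discarded / 32           # percent of a 32-bit element

# Compression ratios (percent) from Table 1 at f = 1e-5 and f = 1e-2.
table1 = {"LZF": (46.7, 37.1), "DEFLATE": (38.5, 25.9), "Bitshuffle": (37.1, 22.2)}
gains = {name: round(lo - hi, 1) for name, (lo, hi) in table1.items()}

print(f"bits discarded: {bits_discarded:.2f}, ideal gain: {ideal_gain:.1f}%")
for name, gain in gains.items():
    print(f"{name:10s} {gain:.1f}% ({100 * gain / ideal_gain:.0f}% of ideal)")
```

On this measure Bitshuffle recovers nearly all of the discarded bits, DEFLATE most of them, and LZF the fewest.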

It can be seen that the compression ratio improvements
are more uniform for Bitshuffle and DEFLATE than for
LZF. The former compress by an extra ~4.5% at each
step, while LZF sees a much better improvement between
$f = 10^{-3}$ and $f = 10^{-2}$. We speculate that this is because
Bitshuffle and DEFLATE effectively compress each
bit as it is discarded, while LZF achieves most of its improvement
when the rounding passes a byte boundary for
some portion of the data.

The improvements in compression from the native precision
case depend on how much precision is kept by the
correlator and data collection software. The CHIME correlator
truncates to 4 bits of precision after spectral channelization,
with the rest of the correlation process being
very nearly lossless. We see that this process keeps an excessive
amount of precision, since compression ratios improve
by more than 22% even when reducing the precision
to a conservative $f = 10^{-5}$. While the precision of the
data could in principle be reduced explicitly during acquisition
by right-bit-shifting the values by several places, this
is much less controlled than comparing to the radiometer
equation in the way presented here.

It is worthwhile considering what compression ratio
would be achieved if we were to simply reduce the precision
then use a minimum data element size. Such a scheme
could be conveniently implemented using a custom HDF5
data type along with the N-bit filter. The number of required
bits per element is given by $-\log(12 f/N)/(2 \log 2)$
which, for these data and for $f = 10^{-3}$, is roughly 15 bits.
One more bit is required for the sign, and at least one more
should be allowed for dynamic range (meaning the total
power could change by a factor of two without overflowing),
and so such a scheme would achieve a compression
ratio just worse than 50%. We see that a mild amount
of precision reduction coupled with lossless compression
beats this and does not have the complications of tuning
scaling factors nor worries about overflows during transits
of bright sources or bursts of RFI. In such a scheme, changing
$N$, which depends on the integration time and spectral
channel bandwidth, would require a change in data type,
which is inconvenient.
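
The bit count for such a fixed-width scheme can be evaluated as follows. The sketch assumes that $N$ is the number of independent samples per integration, taken here as the product of channel bandwidth and integration time, with a channel width of 400 MHz / 1024; that definition and the function name bits_required are assumptions of the example.

```python
import math

def bits_required(f: float, n_samples: float) -> float:
    """Evaluate -log(12 f / N) / (2 log 2)."""
    return -math.log(12.0 * f / n_samples) / (2.0 * math.log(2.0))

channel_bandwidth_hz = 400e6 / 1024     # assumed: 1024 channels over 400-800 MHz
integration_time_s = 21.45
n_samples = channel_bandwidth_hz * integration_time_s   # assumed definition of N

bits = bits_required(1e-3, n_samples)
total_bits = math.ceil(bits) + 2        # plus a sign bit and one bit of headroom
print(f"{bits:.1f} bits, {total_bits} bits per element, "
      f"{100 * total_bits / 32:.0f}% of a 32-bit integer")
```

Under these assumptions the fixed-width scheme needs 17 of the original 32 bits per element, which is consistent with a compression ratio just worse than 50%.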

In Section 2.2 we argued that ordering the data with
time as the fastest varying index is beneficial for compression.
To verify this, we repeated the tests in this section
using a chunk shape of (64, 32, 32). As expected, all
compression ratios worsen. Bitshuffle is most sensitive
to data ordering with compression ratios worsening by 6%
(for $f = 10^{-2}$) to 11% (for $f = 0$), compared to 3% to 4%
for the other compressors.

The benefits of the precision reduction are substantial.
When compressing with Bitshuffle, reducing the precision
to $f = 10^{-3}$ more than halves the final data volume.

5.3. Throughput and compression ratios 

Here we directly compare several lossless compression
schemes on the basis of compression ratio and throughput,
fixing the lossy compression at $f = 10^{-3}$. In addition
to the DEFLATE+shuffle and LZF+shuffle schemes presented
in the previous section, we compare Bitshuffle
to Blosc¹². Blosc is actually a ‘meta-compressor’ which
combines an optimized version of the shuffle pre-filter
with a lossless compressor, using a similar blocking scheme
as Bitshuffle to optimize the use of the L1d cache. Blosc
supports several back-end lossless compressors. Here we
show results for LZ4 as well as LZ4_HC. LZ4_HC is an LZ4
derivative that uses the same compression format and
decompressor as LZ4 but spends much longer on the compression
step in an attempt to achieve better compression ratios.

Both DEFLATE and Blosc have a compression level parameter,
whose value can be between 1 (fastest) and 9
(best compression) inclusive. For DEFLATE we test levels
1 and 7 to bracket the range of throughputs and compression
ratios that can be expected. Levels higher than
7 were found to be excessively slow at compression while
not significantly improving compression ratios. For the
same reasons we test level 1 for Blosc+LZ4 and level 5 for
Blosc+LZ4_HC.

To give an idea of the overhead associated with reading
and writing to the datasets, we show the throughput when
not compressing the data using both chunked and contiguous
storage. Finally, to give an idea of the computational
cost of the precision reduction, we include throughputs for
the example implementation, although we reiterate that little
effort has been put into its optimization.

Benchmarks are for a single thread on an Intel Core
i7-3770 CPU running at 3.40 GHz. Note that this processor
includes support for the SSE2 instruction set but
not AVX2. We employ the HDF5 “core” file driver, such
that the datasets only ever exist in memory and are never
written to disk. Thus the file system plays no role. Our
timings are reproducible in repeated trials to within a few
percent. Results are presented in Table 2.

12 http://www.blosc.org/

Table 2: Comparison of various compression algorithms
on the basis of compression ratios and throughput.

Algorithm       Compression ratio   Write (MiB/s)   Read (MiB/s)
Contiguous      100.0%              5065            2576
Chunked         100.0%              3325            2572
Rounding        100.0%              289             2574
DEFLATE -1      32.2%               73              181
DEFLATE -7      30.3%               23              182
LZF             44.2%               181             245
Blosc+LZ4       47.3%               528             1348
Blosc+LZ4_HC    41.0%               30              1417
Bitshuffle      27.9%               749             1181

We see that Bitshuffle obtains a better compression
ratio than all other algorithms tested and that only Blosc
outperforms Bitshuffle on read throughput. The margin
by which Bitshuffle outperforms the other compressors
is considerable, producing compressed data two thirds
the size of the next best ‘fast enough’ (by our requirements
in Section 2.2) compressor, Blosc. In addition,
Bitshuffle compresses faster than any compressor tested,
which, while not being a design consideration, is a nice
bonus. It is not clear why Bitshuffle compresses faster
than Blosc+LZ4, since they use the same back-end compressor,
use similar block sizes, and the shuffle pre-filter
is much less computationally intensive than Bitshuffle’s
bit transpose.

For reading, the HDF5 overhead is significant. Both
Blosc and Bitshuffle are within roughly a factor of two of
achieving the throughput limits from HDF5. Put another way,
the HDF5 overhead accounts for roughly half the total
time required to read data with these compressors. However,
these speeds are all fast compared to hard-drive read
throughputs.
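
The share of read time attributable to HDF5 can be estimated from the read throughputs in Table 2. This back-of-the-envelope sketch assumes that the uncompressed chunked read rate represents the HDF5 overhead alone and that this overhead simply adds to the decompression time; both are simplifications made for the example.

```python
CHUNKED_READ = 2572.0   # MiB/s, uncompressed chunked storage (Table 2)

read_throughput = {     # MiB/s, from Table 2
    "Blosc+LZ4": 1348.0,
    "Blosc+LZ4_HC": 1417.0,
    "Bitshuffle": 1181.0,
}

# Time per MiB is the reciprocal of throughput, so the fraction of the total
# read time spent in HDF5 is (1 / CHUNKED_READ) / (1 / throughput).
hdf5_fraction = {name: rate / CHUNKED_READ for name, rate in read_throughput.items()}

for name, frac in hdf5_fraction.items():
    print(f"{name:13s} HDF5 share of read time: {100 * frac:.0f}%, "
          f"slowdown relative to HDF5 limit: {1 / frac:.2f}x")
```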


6. Summary and conclusions 

We have presented a high-throughput data compression
scheme for astronomical radio data that obtains a
very high compression ratio. Our scheme includes two
parts: reducing the precision of the data in a controlled
manner to discard noisy bits, hence reducing the entropy
of the data; and the lossless compression of the data using
the Bitshuffle algorithm.

The entire compression algorithm consists of the following
steps, starting with the precision reduction:

1. Estimate the thermal noise on a data-element by
data-element basis using Equation 14 in conjunction
with the noise covariance matrix defined in either
Equation 4 or Equations 5–7.

2. Choose an acceptable fractional increase in noise variance
$f$; we recommend $10^{-5} < f < 10^{-2}$.

3. Round all data to a multiple of the largest possible
power of two subject to the limit on rounding granularity
given in Equation 16.

Followed by the lossless compression:

4. Rearrange the bits within blocks of data by arranging
them in a matrix with dimensions of the number
of elements by the size of each element (in bits), then
perform a transpose.

5. Compress with a fast lossless LZ77-style compressor.
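
Steps 3 to 5 can be strung together in a toy end-to-end sketch. It is not the Bitshuffle implementation: the number of discarded bits is fixed by hand rather than derived from a noise estimate (steps 1 and 2), ties in the rounding go upward for brevity rather than being unbiased, zlib stands in for an LZ77-style compressor such as LZ4, and the data are synthetic.

```python
import random
import zlib

NBITS = 32

def round_to_power_of_two(v: int, n_discard: int) -> int:
    g = 1 << n_discard
    return ((v + g // 2) // g) * g       # ties round up here, for brevity

def bit_transpose(values) -> bytes:
    """Bit plane b of all elements, eight elements per byte, planes in order."""
    planes = bytearray()
    for b in range(NBITS):
        for i in range(0, len(values), 8):
            planes.append(sum(((v >> b) & 1) << k
                              for k, v in enumerate(values[i:i + 8])))
    return bytes(planes)

def bit_untranspose(buf: bytes, count: int):
    stride = count // 8
    return [sum(((buf[b * stride + i // 8] >> (i % 8)) & 1) << b
                for b in range(NBITS))
            for i in range(count)]

def compress(values, n_discard: int):
    rounded = [round_to_power_of_two(v, n_discard) for v in values]
    return zlib.compress(bit_transpose(rounded), 1), rounded

rng = random.Random(2)
data = [100_000 + round(rng.gauss(0, 200)) for _ in range(4096)]  # noisy positive signal

packed0, _ = compress(data, 0)
packed5, rounded5 = compress(data, 5)
restored = bit_untranspose(zlib.decompress(packed5), len(data))
print(4 * len(data), "bytes raw;", len(packed0), "lossless;", len(packed5), "rounded")
```

Decompression returns exactly the rounded values, so the only loss is the controlled rounding error introduced in step 3.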

The lossless compression is implemented and distributed as
the Bitshuffle software package which includes an HDF5
filter for the algorithm and bindings for the C and Python
programming languages.

We reiterate that the precision reduction and Bitshuffle
lossless compression steps are independent. They
work very well together; however, we have shown that several
commonly available lossless compression algorithms
are able to exploit the reduction of information entropy
associated with precision reduction. In addition, we have
shown that Bitshuffle performs very well compared to
other lossless compressors with and without the precision
reduction.

The algorithms in this paper use integers, since the
CHIME experiment produces and records visibility data
in integer representation. However, most of the aforementioned
procedures and conclusions apply equally well
to floating-point data. Reducing the precision of floating-point
numbers entails adapting the rounding procedure to
act only on the significand, and Bitshuffle works for any
data type with no modifications. It is not expected that
floating-point numbers will compress as well as integers
under this scheme, especially for data with frequent zero
crossings. In such data there can be large fluctuations
in the exponent, inhibiting compression. This can especially
affect interferometer data where cross correlations
over long baselines are often zero within noise.

In addition to our compression scheme we present the
following considerations for developing a data storage format
for high performance applications:

• Because decompression can be fast compared to reading
data from disk, compression can improve I/O performance.

• Compression and HDF5 chunking should not reduce
the speed of random access data reads for files stored
on hard disks since the disk seek time is generally
longer than the time required to read and decompress
the data.

• Data are generally read more often than they are
written, and as such read performance should be prioritized
over write performance. For this reason, a
post-acquisition data reordering step can be worthwhile.

• Data consumer usage patterns should be the primary
consideration when deciding a data layout. For
CHIME this means having time as the minor (fastest
varying) axis. This is also beneficial for compression
ratios when using Bitshuffle since time is the axis
over which the data are most highly correlated.

We have shown that when applying our compression
scheme to data produced by the CHIME experiment, the
data are compressed to 28% of their original size, which in
many cases will improve read performance by over a factor
of three. In addition we have shown that Bitshuffle,
when applied to CHIME data, achieves the best compression
ratio and the highest write throughput of all the compression
algorithms tested, with only Blosc achieving a higher read
throughput.

The benefits of our compression scheme are substantial
with essentially no drawbacks. The CHIME experiment
is employing the algorithm in a post-acquisition
data re-ordering and compression step that creates our
final archive files. As radio datasets continue to grow in
size, more instruments will need to employ compression
to keep these datasets manageable. We expect that our
scheme is broadly applicable to most post-correlation radio
data, and that aspects of it could benefit many current
and future instruments should the change in data format
be deemed worthwhile.